Comparison of Six Examinations Given in Rhetoric 101
at the University of Illinois, Fall, 1965

At the end of the fall semester, 1965, six separate final examinations were
administered to 2,545 students enrolled in Rhetoric 101, the basic English
composition course required of all freshmen. Two of these examinations were
the College Entrance Examination Board (CEEB) English Composition Tests. The
other four were final examinations constructed by the Rhetoric department.

The 2,545 students were a randomly selected sample of the 4,100 students
enrolled in the course. The two forms of the CEEB tests were administered
to randomly selected groups, each of which was to contain approximately
1,000 students. The remaining students took the various forms of the
departmental examinations according to the regular final examination
schedule. Each student, therefore, took only one of the objective tests.

The CEEB English Composition Test is available in several forms, two of
which were used in the present study: KPL and KPL 1. It is a one-hour
objective test designed to assess indirectly a student's ability to write. The
test has three parts: Part A measures correctness and effectiveness of
expression, Part B measures ability to organize ideas and materials, and Part
C measures sensitivity to language.

The four Rhetoric final examinations (H49, U59A, IR54, and IU54) are
also one-hour objective tests designed to assess indirectly a student's
ability to write. These tests have four sections: Section A, vocabulary;
Section B, spelling; and Sections C and D, knowledge of what constitutes
good usage and effectiveness in sentence construction.
The students' answers were coded on DIGITEK answer sheets and then
processed, yielding cards with all item information. The students' responses
(now on cards) were then processed by the Measurement and Research Division's
item analysis program, which provided the statistics needed to compare the
six tests.

In Table 1 all the relevant test statistics obtained from the item
analysis are presented for each of the tests.

Table 1

Test Statistics for the Six Rhetoric 101 Examinations

The standard deviations (σ), variances (σ²), and score ranges indicated
that the scores varied over a wider range for the CEEB tests than for
the Rhetoric tests. This could be interpreted as meaning that the CEEB tests
discriminate among more students.

The number of students (N), the number of items (k), and the number
of alternatives are presented for each test.

The skewness measure indicates how well the sample distribution compares
to a normal one. If the high raw scores were more numerous than the low
scores, the distribution would be negatively skewed. On the other hand,
if the low raw scores were more numerous than the high, the distribution
would be positively skewed. The KPL and KPL 1 distributions were the most
nearly normal of the six.
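A minimal sketch of one common skewness statistic, the moment coefficient g1 = m3 / m2^(3/2), may make the sign convention above concrete. The report does not say which formula the item analysis program used, so this choice is an assumption, and the score lists are invented for illustration.

```python
def skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2**1.5.

    Uses population (divide-by-n) central moments. The formula used by the
    original item analysis program is not stated; this one is an assumption.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical score lists: most scores high with a long low tail, and its mirror image.
piled_high = [10, 9, 9, 9, 8, 8, 3]
piled_low = [0, 1, 1, 1, 2, 2, 7]
print(skewness(piled_high))  # negative: high scores more numerous
print(skewness(piled_low))   # positive: low scores more numerous
```

A perfectly symmetric set of scores gives g1 = 0, the value expected for a normal distribution.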

Kurtosis is used to measure the peakedness of a distribution. If the
distributions were normal, kurtosis would be zero. If the distribution had
a higher peak than the normal, kurtosis would be positive; if it had a
lower peak than the normal, kurtosis would be negative. Here again, the KPL
and KPL 1 distributions were the most nearly normal.
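The "zero for a normal distribution" convention above corresponds to excess kurtosis, g2 = m4 / m2² - 3. The sketch below assumes that definition (the report does not give the program's formula) and uses invented score lists.

```python
def excess_kurtosis(scores):
    """Excess kurtosis g2 = m4 / m2**2 - 3, which is 0 for a normal distribution.

    Population (divide-by-n) moments are assumed.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                 # evenly spread: flatter than normal
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]  # piled at the centre: more peaked
print(excess_kurtosis(flat))    # about -1.3
print(excess_kurtosis(peaked))  # about 1.5
```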

The Kuder-Richardson Formula 21 (K-R 21) provides an estimate of the
reliability of a test from a single administration. As Table 1 shows, the
two CEEB tests had the highest reliability coefficients, indicating that
they were measuring ability more consistently than were the departmental
tests.
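K-R 21 needs only the number of items k, the mean total score M, and the variance of total scores s²: r = k/(k - 1) * (1 - M(k - M)/(k s²)). The sketch below uses hypothetical 100-item tests (Table 1's values are not reproduced here) to show that, for a fixed number of items and mean, a wider spread of scores yields a higher estimate.

```python
def kr21(k, mean, sd):
    """Kuder-Richardson Formula 21 reliability estimate.

    k    -- number of items
    mean -- mean total score
    sd   -- standard deviation of total scores
    """
    variance = sd ** 2
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical tests with the same length and mean but different spreads.
print(round(kr21(100, 60, 12), 3))  # 0.842 (wider spread)
print(round(kr21(100, 60, 8), 3))   # 0.631 (narrower spread)
```

K-R 21 treats all items as equally difficult, so when item difficulties vary it tends to give a lower estimate than K-R 20.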

The standard error of measurement estimates how far a student's obtained
score could be expected to vary from his true score; about two-thirds of
obtained scores lie within one standard error of the corresponding true
scores. The standard errors were highest for the CEEB tests because the
standard deviations of these tests were also higher.
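The classical formula SEM = SD * sqrt(1 - r) shows why the last point can hold even alongside higher reliability: the error grows with the spread of scores and shrinks as reliability rises. The sketch assumes the K-R 21 coefficient serves as r, which the report does not state, and uses hypothetical values.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical values: the wider test has the larger SEM despite higher reliability.
wide = standard_error_of_measurement(sd=12, reliability=0.84)   # about 4.8
narrow = standard_error_of_measurement(sd=8, reliability=0.75)  # about 4.0
print(wide, narrow)
```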
The discriminating power of an item is measured by the point-biserial
correlation, which is used when a dichotomous variable is to be related to a
continuous variable. Here, the dichotomous (right or wrong) responses to an
item are related to the distribution of total test scores to see whether the
item reflects the discrimination made by the test as a whole. The mean point
biserials show that the CEEB tests were doing a better job of discriminating
between the students taking each test than were the Rhetoric tests. To see
whether these differences were significant, an analysis of variance was run.
Table 2 presents the analysis of variance summary. Because the F value was
significant, the Scheffé test was used to determine the source of significance.
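A minimal sketch of the point-biserial coefficient: r_pb = (M_p - M_q) / s * sqrt(pq), where M_p and M_q are the mean total scores of students who answered the item right and wrong, p and q are those proportions, and s is the divide-by-n standard deviation of total scores. With that s, it equals the Pearson correlation between the 0/1 item scores and the totals. Whether the original program removed the item from the total before correlating is not stated; this sketch assumes it did not, and the data are invented.

```python
import math


def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total test scores.

    Assumes at least one right and one wrong answer (otherwise undefined).
    """
    n = len(totals)
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    p = len(right) / n  # proportion answering correctly
    q = 1 - p
    mean_all = sum(totals) / n
    s = math.sqrt(sum((t - mean_all) ** 2 for t in totals) / n)  # divide-by-n SD
    m_p = sum(right) / len(right)
    m_q = sum(wrong) / len(wrong)
    return (m_p - m_q) / s * math.sqrt(p * q)


# Hypothetical item answered correctly by the three highest scorers.
item = [1, 1, 1, 0, 0]
totals = [9, 8, 7, 4, 2]
print(point_biserial(item, totals))  # positive: the item discriminates
```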

Table 3 gives the results of this test. Here, the greatest source of
difference was between the two CEEB tests and the Rhetoric department tests.

This reinforces the conclusion that the CEEB tests were doing a better job
of discriminating between students than were the Rhetoric tests.
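Scheffé's method tests any contrast among the treatment means, L = sum(c_i * M_i) with coefficients summing to zero, against a critical value of (k - 1) times the tabled F for (k - 1, N - k) degrees of freedom. In the sketch below only MS within (.0102805) comes from Table 2. The mean point biserials, the 100 items per test, and the critical F of about 2.23 are hypothetical placeholders. The contrast pitting the average of the two CEEB tests against the average of the four Rhetoric tests is one way the comparison described above could be set up, not necessarily the one the study used.

```python
def scheffe_test(means, ns, coefficients, ms_within, f_crit):
    """Scheffé test of the contrast L = sum(c_i * mean_i).

    f_crit is the tabled F for (k - 1, N - k) degrees of freedom at the
    chosen alpha level; it must be looked up, not computed here.
    Returns (F for the contrast, Scheffé critical value, significant?).
    """
    if abs(sum(coefficients)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(means)
    estimate = sum(c * m for c, m in zip(coefficients, means))
    variance_factor = sum(c ** 2 / n for c, n in zip(coefficients, ns))
    f_contrast = estimate ** 2 / (ms_within * variance_factor)
    critical = (k - 1) * f_crit
    return f_contrast, critical, f_contrast > critical


# Order: KPL, KPL 1, H49, U59A, IR54, IU54.
# HYPOTHETICAL mean point biserials and group sizes; only ms_within is from Table 2.
means = [0.45, 0.44, 0.35, 0.36, 0.34, 0.35]
ns = [100] * 6
ceeb_vs_rhetoric = [0.5, 0.5, -0.25, -0.25, -0.25, -0.25]
F_CRIT_05 = 2.23  # approximate tabled F(.05; 5, 594); check a table before use
print(scheffe_test(means, ns, ceeb_vs_rhetoric, 0.0102805, F_CRIT_05))
```

Pairwise comparisons use the same function with coefficients such as [1, -1, 0, 0, 0, 0].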



Table 2

Analysis of Variance Summary Table

Source of Variation      SS      df     MS           F
Treatments              1.683     5    .3365340     32.7352366
Within Treatments       6.107   594    .0102805
Total                   7.789   599
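Each mean square in Table 2 is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below recomputes those figures from the table and adds a generic one-way ANOVA for raw groups. Treating the observations as item point biserials grouped by test is an assumption suggested by the 594 within-treatment degrees of freedom (600 observations in six groups); the report does not describe the data layout.

```python
def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, F) for raw groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f


def f_from_summary(ss_between, df_between, ss_within, df_within):
    """Recompute the mean squares and F ratio from a summary table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Table 2's rounded sums of squares give values close to its MS and F columns.
ms_b, ms_w, f = f_from_summary(1.683, 5, 6.107, 594)
print(round(ms_b, 4), round(ms_w, 5), round(f, 2))  # 0.3366 0.01028 32.74
```

The small gap between 32.74 and the tabled 32.7352366 comes from rounding the sums of squares to three decimals.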
Table 3

Scheffé Test for Multiple Comparisons on Point Biserials for the Six Tests

At the end of the fall semester, grades were reported for all Rhetoric
students. The grades of those students who had taken the CEEB and Rhetoric
tests were correlated with their scores on the respective tests. The results
are presented in Table 4.
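A Pearson product-moment correlation is the usual way to correlate total scores with course grades. The report does not name the coefficient or say how letter grades were converted to numbers, so both the formula and the A=4 ... E=0 coding below are assumptions, and the scores and grades are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL grade coding and data; the report does not say how grades were scored.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}
scores = [62, 55, 48, 70, 40]
grades = ["B", "C", "C", "A", "D"]
print(pearson_r(scores, [GRADE_POINTS[g] for g in grades]))
```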

Table 4

Correlation of Total Score to Rhetoric 101 Grade

Test      Rhetoric Grade
KPL       .3810
KPL 1     .3765
H49       .5227
U59A      .4748
IR54      .4441
IU54      .5259
Table 5

Test of Significance Between Correlations of Total Score
to Rhetoric 101 Grade

          KPL          KPL 1
H49       p = .024     p = .026
U59A      NSD          NSD
IR54      NSD          NSD
IU54      p = .001     p = .001

NSD = no significant difference

The Rhetoric tests correlate more highly with the grade than do the CEEB
tests. This may be because the Rhetoric department tests were designed
specifically to test the objectives of the rhetoric course, whereas the CEEB
tests were designed on a national basis with national rhetoric objectives
in mind. Table 5 presents the results of the test of significance between
the correlations in Table 4. The two CEEB test score-grade correlations were
significantly different from the H49 and IU54 Rhetoric test score-grade
correlations. This is understandable since the H49 and IU54 tests correlate
most highly with the course grade.
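The report does not say how the correlations in Table 5 were compared. Because each student took only one test, the correlations come from independent groups, and Fisher's z transformation is a standard choice for that case; the sketch below assumes it. The correlations are from Table 4, but the group sizes of 300 are hypothetical, since the number of students behind each correlation is not given.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed test of H0: rho1 == rho2 for correlations from independent
    samples, using Fisher's z transformation. Returns (z, p)."""
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return z, p


# KPL vs. H49 from Table 4; the group sizes are HYPOTHETICAL.
z, p = compare_independent_correlations(0.3810, 300, 0.5227, 300)
print(round(z, 2), round(p, 3))
```

With the real group sizes the p-values would differ; larger groups make the same difference in r more significant.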

For each of the two forms of the CEEB tests, each part was correlated with
the students' Rhetoric grades. The results are presented in Table 6.

This correlation was done to determine whether any one part of the CEEB
tests could be used as a substitute for the whole test. Part 1 of the
KPL test correlated most highly with the Rhetoric 101 grade. Adding Parts
2 and 3 for KPL increases the correlation coefficient by .06, which
normally would not justify retaining these two parts of the test.
However, Part 1 accounts for only 10% of the variation (.3230² ≈ .10), and
adding Parts 2 and 3 accounts for an additional 4%. For this reason one
might want to keep all three parts of the test. For the KPL 1 test, Part 3
correlated most highly with the Rhetoric 101 grade. Adding Parts 1 and 2
for KPL 1 increases the correlation by .03. In this case Part 3 of KPL 1
accounts for 14% of the variation (.3746² ≈ .14), and Parts 1 and 2 add 2%.
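The percentages above are squared correlations: r² for a single part and the squared multiple correlation R² for several parts together. The report does not say how its multiple correlations were computed. A standard approach, assumed here, solves R_xx·β = r_xy for the standardized weights β and takes R² = r_xy·β. The KPL 1 correlations are as read from Table 6.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def multiple_r_squared(r_xx, r_xy):
    """Squared multiple correlation from correlations only: R^2 = r_xy . beta."""
    beta = solve(r_xx, r_xy)  # standardized regression weights
    return sum(r * b for r, b in zip(r_xy, beta))


# KPL 1: parts 1-3 with each other (r_xx) and with the Rhetoric grade (r_xy).
r_xx = [[1.0, 0.3127, 0.3103],
        [0.3127, 1.0, 0.2323],
        [0.3103, 0.2323, 1.0]]
r_xy = [0.2600, 0.1741, 0.3746]
print("Part 3 alone, r^2:", round(0.3746 ** 2, 3))
print("All three parts, R^2:", round(multiple_r_squared(r_xx, r_xy), 3))
```

With these figures, R² comes out near .166 (R about .41), roughly .03 above Part 3's r of .3746 and about two to three percentage points more variance. This agrees with the increase reported above.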



Table 6

Correlation of Three Parts of CEEB Tests to Rhetoric Grade

KPL
                Part 1    Part 2    Part 3
Grade           .3230     .2970     .2620

KPL 1
                Part 1    Part 2    Part 3
Part 2          .3127
Part 3          .3103     .2323
Grade           .2600     .1741     .3746

Multiple Correlation of Three Parts to Rhetoric Grade

          Part 3    Part 1    Part 2
Part 1    .3230
Part 2    .3593     .4920
Part 3    .3831
          .3849     .2030
In summary, there appeared to be considerable differences between the
CEEB and Rhetoric tests. The CEEB tests had lower mean scores and mean
difficulties, with higher standard deviations and standard errors of
measurement, showing that the students were spread over a larger range
of scores than on the Rhetoric tests. The higher K-R 21 coefficients of the
CEEB tests indicated that they were more stable in what they were measuring,
and the higher mean point biserials indicated that their items were doing a
better job of discriminating on the basis of total score than were those of
the Rhetoric tests. The Rhetoric tests, on the other hand, were more highly
related to course grade, indicating that they seemed to measure the outcome
of the course more accurately. Ideally, the Rhetoric tests should be made
more reliable or the CEEB tests made more valid.